Incomplete Directed Perfect Phylogeny

Authors

  • Itsik Pe'er
  • Ron Shamir
  • Roded Sharan
Abstract

Perfect phylogeny is one of the fundamental models for studying evolution. We investigate the following variant of the problem: The input is an n × m species-characters matrix. The characters are binary and directed, i.e., a species can only gain characters. The difference from standard perfect phylogeny is that for some species the state of some characters is unknown. The question is whether one can complete the missing states in a way that admits a perfect phylogeny. We call this problem Incomplete Directed Perfect Phylogeny (IDP). The problem arises in classical phylogenetic studies, when some states are missing or undetermined. Swofford’s PAUP software package [4] provides an exponential-time solution to the problem by exhaustive search. Quite recently, a novel kind of genomic data has given rise to the same problem: Nikaido et al. [3] use inserted repetitive genomic elements, particularly SINEs, as a source of evolutionary information. The specific insertion events are identifiable by the flanking sequences on both sides of the insertion site. These insertions are assumed to be unique, irreversible events in evolution. However, the site and its flanking sequences may be lost when a large region of the genome that includes them is deleted. In that case we do not know whether an insertion occurred in the missing site. One can model such data by assigning each locus a character, whose state is ‘1’ if the SINE occurred in that locus, ‘0’ if the locus is present but does not contain the SINE, and ‘?’ if the locus is missing. An example of the problem input and solution is given in Figure 1.
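To make the problem statement concrete, here is a minimal Python sketch of the exhaustive-search approach the abstract mentions. It is not the authors' algorithm and not PAUP's implementation. It assumes the standard characterization for directed binary characters: a complete 0/1 matrix admits a directed perfect phylogeny exactly when, for every pair of characters, the sets of species possessing them are either disjoint or nested. The function names `is_laminar` and `complete_idp` are hypothetical, chosen only for illustration.

```python
from itertools import product


def is_laminar(matrix):
    """Return True if every pair of column 1-sets is disjoint or nested.

    Assumption: this pairwise test is used as the criterion for a complete
    binary matrix to admit a directed perfect phylogeny.
    """
    n_cols = len(matrix[0]) if matrix else 0
    # For each character (column), collect the species (rows) that have it.
    one_sets = [
        frozenset(i for i, row in enumerate(matrix) if row[j] == 1)
        for j in range(n_cols)
    ]
    for a in range(n_cols):
        for b in range(a + 1, n_cols):
            s, t = one_sets[a], one_sets[b]
            if s & t and not (s <= t or t <= s):
                return False
    return True


def complete_idp(matrix):
    """Try every 0/1 filling of the '?' entries (exponential in their number).

    Returns the first completion that passes is_laminar, or None if no
    completion does.
    """
    holes = [
        (i, j)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value == "?"
    ]
    for fill in product((0, 1), repeat=len(holes)):
        candidate = [list(row) for row in matrix]
        for (i, j), value in zip(holes, fill):
            candidate[i][j] = value
        if is_laminar(candidate):
            return candidate
    return None


# Three species, two characters; '?' marks a missing locus.
example = [
    [1, 1],
    [1, "?"],
    ["?", 1],
]
print(complete_idp(example))
```

The running time doubles with every additional '?' entry, which is why an exhaustive search of this kind is only practical for very small inputs.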

Similar resources

The binary perfect phylogeny with persistent characters

The binary perfect phylogeny model is too restrictive to model biological events such as back mutations. In this paper we consider a natural generalization of the model that allows a special type of back mutation. We investigate the problem of reconstructing a near perfect phylogeny over a binary set of characters where characters are persistent: characters can be gained and lost at most once. ...

Efficient Enumeration of the Directed Binary Perfect Phylogenies from Incomplete Data

We study a character-based phylogeny reconstruction problem when an incomplete set of data is given. More specifically, we consider the situation under the directed perfect phylogeny assumption with binary characters in which for some species the states of some characters are missing. Our main object is to give an efficient algorithm to enumerate (or list) all perfect phylogenies that can be ob...

On the Generality of Phylogenies from Incomplete Directed Characters

We study a problem that arises in computational biology, when wishing to reconstruct the phylogeny of a set of species. In Incomplete Directed Perfect Phylogeny (IDP), the characters are binary and directed (i.e., species can only gain characters), and the states of some characters are unknown. The goal is to complete the missing states in a way consistent with a perfect phylogenetic tree. This...

Haplotype Block Partitioning and tagSNP Selection under the Perfect Phylogeny Model

Single Nucleotide Polymorphisms (SNPs) are the most common form of polymorphism in the human genome. Analyses of genetic variations have revealed that individual genomes share common SNP-haplotypes. The particular pattern of these common variations forms a block-like structure on the human genome. In this work, we develop a new method based on the Perfect Phylogeny Model to identify haplo...

Influence of Tree Topology Restrictions on the Complexity of Haplotyping with Missing Data

Haplotyping, also known as haplotype phase prediction, is the problem of predicting likely haplotypes based on genotype data. One fast haplotyping method is based on an evolutionary model where a perfect phylogenetic tree is sought that explains the observed data. Unfortunately, when data entries are missing, as is often the case in real laboratory data, the resulting formal problem IPPH, which...

Journal:
  • SIAM J. Comput.

Volume 33  Issue 

Pages  -

Publication date 2000